3-D Face Point Trajectory Synthesis Using An Automatically Derived Visual Phoneme Similarity Matrix
Authors
Abstract
This paper presents a novel algorithm that generates three-dimensional face point trajectories for a given speech file, with or without its text. The proposed algorithm first employs an off-line training phase. In this phase, recorded face point trajectories, along with their speech data and phonetic labels, are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual features. Acoustics are represented by line spectral frequencies (LSF), and face points are represented by their principal components (PC). During the synthesis stage, the speech input is rated in terms of its similarity to the codebook entries, and each codebook entry is assigned a weighting coefficient based on that similarity. If phonetic information about the test speech is available, it is used to restrict the codebook search to the few codebook entries that are visually closest to the current phoneme (a visual phoneme similarity matrix is generated for this purpose). These weights are then used to synthesize the principal components of the face point trajectory. The performance of the algorithm is tested on held-out data, where the synthesized face point trajectories show a correlation of 0.73 with the true face point trajectories.
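The synthesis stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting function, the codebook layout, or the number of visually closest phonemes retained, so everything below is an assumption made for illustration: the function name `synthesize_visual_frame`, the parameters `top_k` and `gamma`, the tuple layout of codebook entries, and the exponential weighting of Euclidean LSF distance are all hypothetical and are not taken from the paper.

```python
import math


def synthesize_visual_frame(lsf, codebook, phoneme=None, similarity=None,
                            top_k=2, gamma=1.0):
    """Estimate visual principal components for one speech frame.

    codebook:   list of (phoneme_label, acoustic_lsf_vector, visual_pc_vector)
    similarity: optional dict, phoneme -> {other_phoneme: visual similarity}
    All names and the weighting rule are illustrative assumptions.
    """
    entries = codebook
    if phoneme is not None and similarity is not None:
        # Restrict the search to the top_k phonemes that are visually
        # closest to the current phoneme (highest similarity first).
        row = similarity[phoneme]
        ranked = sorted(row, key=row.get, reverse=True)
        allowed = set(ranked[:top_k])
        # Fall back to the full codebook if the restriction leaves nothing.
        entries = [e for e in codebook if e[0] in allowed] or codebook

    # Assumed weighting: exponential of negative Euclidean LSF distance,
    # shifted by the minimum distance for numerical stability.
    dists = [math.dist(lsf, e[1]) for e in entries]
    d_min = min(dists)
    raw = [math.exp(-gamma * (d - d_min)) for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]  # weights sum to 1

    # Weighted sum of the visual principal components of the kept entries.
    n_pc = len(entries[0][2])
    return [sum(w * e[2][i] for w, e in zip(weights, entries))
            for i in range(n_pc)]
```

When no phoneme label is passed, the sketch searches the whole codebook, which corresponds to the "without its text" case. A larger `gamma` concentrates the weight on the acoustically nearest entries.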
Similar resources
Speech driven 3-d face point trajectory synthesis algorithm
This paper presents a novel algorithm which generates three-dimensional face point trajectories for a given speech file with or without its text. The proposed algorithm first employs an off-line training phase. In this phase, recorded face point trajectories along with their speech data and phonetic labels are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual f...
Codebook Based Face Point Trajectory Synthesis Algorithm Using Speech
This paper presents a novel algorithm which generates three-dimensional face point trajectories for a given speech file with or without its text. The proposed algorithm first employs an off-line training phase. In this phase, recorded face point trajectories along with their speech data and phonetic labels are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual f...
A New Vision-Based and GPS-Signal-Independent Approach in Jamming Detection and UAV Absolute Positioning Assessment
Positioning of Unmanned Aerial Vehicles (UAVs) in outdoor environments is usually done with the Global Positioning System (GPS). Because the GPS signal has low power at the earth's surface, its performance is disrupted in environments contaminated by jamming attacks. UAV positioning and its accuracy using GPS are degraded under jamming attacks. A positioning error about tens of...
Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling
Long audio alignment systems for Spanish and English are presented, within an automatic subtitling application. Language-specific phone decoders automatically recognize audio contents at phoneme level. At the same time, language-dependent grapheme-to-phoneme modules perform a transcription of the script for the audio. A dynamic programming algorithm (Hirschberg's algorithm) finds matches betwee...
Additional use of phoneme duration hypotheses in automatic speech segmentation
In this paper, we describe a new approach for speaker-independent automatic phoneme alignment. Typical algorithms for this task use only phoneme-to-frame similarity measures, which are maximised or minimised in some way. In addition to such similarity measures, we use phoneme duration hypotheses generated by the speech synthesis system HADIFIX [1]. For algorithms based on dynamic programming, it is ...